Final Project: What affects suicide rates in the United States?

    In the United States, mental health concerns have been rising, whether they stem from the over-prescribing of opioids, mental instability among young people who commit horrible acts, stress from school, or depression from societal expectations. It is a serious issue that can often lead to suicide or self-harm. The US should begin to study this issue and strive to implement policies that help people with anxiety and other mental health issues and that offer easier access to mental health care, such as therapists, psychologists, or possibly medication. Recently, there have been issues with mental health at UMD, sparking movements to get students to push for changes to the university’s policies and efforts. Hypothesis: I believe that variables such as Access to Mental Health Resources, Depression, and Drug Usage by state all help predict a state’s suicide rate.

What is the narrative at UMD around mental health?

Recent University of Maryland Diamondback articles have expressed concern about UMD’s disregard for the mental health issues faced by students and reported that the university health center had a backlog of over 30 days, which can lead to even worse mental health issues.

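# Packages used in this section
library(readtext)
library(tm)
library(wordcloud)
library(RColorBrewer)
# Read the collected Diamondback articles and build a single-document corpus
mental <- readtext('mental.txt')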
mental <- paste(mental$text, collapse=" ")
mental <- VectorSource(mental)
corpus <- Corpus(mental)
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, removeWords, c("student","_","-","“","","201516","201617","kirklandgordon",'schledwitz'))
#creating matrix
dtm <- DocumentTermMatrix(corpus)
dtm <- as.matrix(dtm)
frequency <- colSums(dtm)
frequency <- sort(frequency, decreasing=TRUE)
# 6 most common words (head() returns six entries by default):
head(frequency)
##   students     mental counseling university     center     health 
##         60         42         39         36         32         31
words <- names(frequency)
#png("wordcloud_packages.png", width=15,height=15, units='in', res=300)
#wordcloud(words, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))
wordcloud(words[1:120], frequency[1:120], colors=brewer.pal(8, "Dark2"))

Variable Selection:

Dependent Variable: Suicide Rate

The suicide rate variable comes from a 2016 CDC report, which gives the mortality rate as the number of deaths per 100,000 total population. The higher the suicide rate, the larger the share of a state's population that died by suicide.

mean(data$Suicide.Rate) #Suicide Rate
## [1] 15.792

Source https://www.cdc.gov/nchs/pressroom/sosmap/suicide-mortality/suicide.htm
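
To make the unit concrete, here is a minimal sketch of how a rate per 100,000 is computed and how it relates to a percentage. The death and population counts are hypothetical values chosen for illustration and are not taken from the CDC report; only the mean of 15.792 comes from the output above.

```r
# Hypothetical counts for illustration only; not taken from the CDC report.
deaths <- 1500
population <- 10000000

# Crude mortality rate per 100,000 population
rate_per_100k <- deaths / population * 100000
rate_per_100k

# The mean rate reported above, expressed as a percentage of the population
mean_rate <- 15.792
mean_rate_pct <- mean_rate / 100000 * 100
mean_rate_pct
```

A rate of about 15.8 per 100,000 is therefore a small fraction of one percent, which is why the rate should not be read directly as a percentage.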

Independent Variables

Depression

The depression variable comes from a 2010 CDC report on current depression among adults. Higher percentages represent a higher rate of any reported depression in that state.

mean(data$Depression.Rates) #Depression
## [1] 7.752

Source https://www.cdc.gov/mmwr/preview/mmwrhtml/mm5938a2.htm?s_cid=mm5938a2_w#tab1

Drug Usage

The drug use variable is based on data from the U.S. Census Bureau. The total measure combines “Percentage of Teenagers Who Used Illicit Drugs in the Past Month”, “Number of Opioid Pain Reliever Prescriptions per 100 People”, “Percentage of Adults Who Used Illicit Drugs in the Past Month”, and numerous other measures to score each state. A higher score indicates a higher prevalence of drug usage.

mean(data$Drug.Use) #Drug Usage
## [1] 41.8912

Source https://wallethub.com/edu/drug-use-by-state/35150/

Access to Mental Health Resources

The access to mental health resources variable comes from 2014 data collected by Mental Health America, which classifies each state by an overall ranking and an access to care ranking. A ranking near the top (such as 1-10) indicates a lower prevalence of mental illness and higher rates of access to care. A ranking near the bottom indicates a higher prevalence of mental illness and lower rates of access to care. The combined scores of all 15 measures make up the overall ranking, which includes both adult and youth measures as well as prevalence and access to care measures. Because the variable is a rank, a larger number means less access.

mean(data$Access.to.care) #Access to Mental Care
## [1] 25.5

Source http://www.mentalhealthamerica.net/issues/2017-state-mental-health-america-ranking-states

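# Mapping
# Packages used for the maps and plots below
library(ggplot2)      # ggplot(), map_data(); needs the maps and mapproj packages installed
library(dplyr)        # arrange()
library(ggthemes)     # theme_fivethirtyeight(), theme_hc(), scale_colour_hc()
library(plotly)       # ggplotly()
library(easyGgplot2)  # ggplot2.multiplot()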
data <-read.csv("finalprojectdata.csv")
data <- data.frame(data[,-1], row.names=data[,1])
states_map<-map_data("state")
# Create a separate data frame for the depression map, dropping states with missing (zero-coded) data.
dataDepresionMap <- data[which(data$Depression.Rates != 0), ]
dataDepresionMap<-merge(states_map, dataDepresionMap, by.x="region", by.y="state")
dataDepresionMap<-arrange(dataDepresionMap, group, order)
#Merging State Data
dataMap<-merge(states_map, data, by.x="region", by.y="state")
dataMap<-arrange(dataMap, group, order)
#Suicide Rate Map
map<-ggplot(dataMap, aes(x=long, y=lat, group=group, fill=Suicide.Rate)) +
  geom_polygon(color = "black") +
  scale_fill_gradient2(low="blue", mid="grey88", high="magenta", midpoint=median(data$Suicide.Rate))+
  expand_limits(x= states_map$long, y=states_map$lat) + 
  ggtitle("Suicide Rate by State", subtitle = NULL) + 
  labs(caption = "Source: CDC 2016") + 
  labs(fill='Per 100,000')  + coord_map("bonne",  param=25) 
map_map <- map+theme_fivethirtyeight()
#ggplotly(map)
#Access to Care Map
map1<-ggplot(dataMap, aes(x=long, y=lat, group=group, fill=Access.to.care)) +
  geom_polygon(color = "black") +
  scale_fill_gradient2(low="blue", mid="grey88", high="magenta", midpoint=median(data$Access.to.care))+
  expand_limits(x= states_map$long, y=states_map$lat) +
  ggtitle("Access to Mental Health Care", subtitle = "Greater = less access") +
  labs(caption = "Source: Mental Health America 2014") + labs(fill=NULL) +
  coord_map("bonne",  param=45) 
#map1
#Depression Rate Map
map2<-ggplot(dataDepresionMap, aes(x=long, y=lat, group=group, fill=Depression.Rates)) +
  geom_polygon(color = "black") +scale_fill_gradient2(low="blue", mid="grey88", high="magenta", midpoint=median(dataDepresionMap$Depression.Rates))+
  expand_limits(x= states_map$long, y=states_map$lat) +
  ggtitle("Any Depression Reporting", subtitle = NULL) +
  labs(caption = "Source: CDC 2010") + labs(fill='Rate%') +
  coord_map("bonne",  param=45) 
#Drug Usage Map
map3<-ggplot(dataMap, aes(x=long, y=lat, group=group, fill=Drug.Use)) +
  geom_polygon(color = "black") +
  scale_fill_gradient2(low="blue", mid="grey88", high="magenta", midpoint=median(data$Drug.Use))+
  expand_limits(x= states_map$long, y=states_map$lat) +
  ggtitle("Drug usage", subtitle = NULL)+labs(caption = "Source: U.S. Census Bureau")+ labs(fill='Score')  +
  coord_map("bonne",  param=45) 

ggplotly(map_map)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
## Warning: plotly.js does not (yet) support horizontal legend items 
## You can track progress here: 
## https://github.com/plotly/plotly.js/issues/53

At first glance, there appears to be a pocket of states with greater access to mental health resources, such as counseling, that also have lower suicide rates. The maps also suggest that depression is higher in states with higher drug usage, as in the Midwest. Comparing the mental care and suicide rate maps, lower access to care appears to go together with a higher suicide rate. The maps also suggest that states with higher depression have better access to mental care resources.

Comparing the relationships of all the variables:

# Multiple graphs on the same page
ggplot2.multiplot(map,map1,map2,map3, cols=2)


The relationship of Depression Rates and Drug Usage to Suicide Rates:

#Depression and Suicide
ds<-ggplot(data, aes(x= Suicide.Rate, y=Depression.Rates)) + 
  geom_point()+geom_smooth(method = "lm")+
  scale_x_continuous() + 
  ggtitle("Depression Rates on Suicide Rates") +
  xlab("Suicide Rates")+ ylab("Depression Rates") +theme_hc() +
  scale_colour_hc()

ggplotly(ds)

This plot indicates that as depression rates increase, so do suicide rates. However, about 6 states are missing depression data and are recorded as 0. These points act as outliers and pull the fit downward, lowering the overall depression rate.
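
The effect of zero-coded missing values can be sketched with a small toy example. The data frame below is invented for illustration and only borrows the column names used in this report. Recoding the placeholder zeros as NA lets R treat them as missing rather than as real observations.

```r
# Toy data, invented for illustration; 0 stands in for "not reported".
toy <- data.frame(
  Suicide.Rate     = c(10, 12, 14, 16, 18, 20),
  Depression.Rates = c(6, 7, 0, 9, 0, 11)
)

# With the zeros left in, the mean is dragged downward
mean_with_zeros <- mean(toy$Depression.Rates)

# Recode the placeholder zeros as NA so they count as missing
toy$Depression.Rates[toy$Depression.Rates == 0] <- NA
mean_without_zeros <- mean(toy$Depression.Rates, na.rm = TRUE)

# lm() drops rows containing NA by default, so only reported states are fitted
fit <- lm(Depression.Rates ~ Suicide.Rate, data = toy)
coef(fit)
```

In this toy case the mean moves from 5.5 to 8.25 once the zeros are treated as missing, which is the same kind of distortion described above.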

ds1<-ggplot(dataDepresionMap, aes(x= Suicide.Rate, y=Depression.Rates, weight = Access.to.care)) + 
  geom_point()+geom_smooth(aes(weight=Access.to.care),method="lm", formula = y ~ x + I(x^2)) +
  scale_x_continuous() + 
  ggtitle("Depression Rates on Suicide Rates\nWeighted by Access to Mental Health Resources")+
  xlab("Suicide Rates")+ ylab("Depression Rates") + theme_hc() +
  scale_colour_hc()
ggplotly(ds1)

Removing the states without data in the second plot shows the range of the distribution better and supports the observation that, in most states, suicide rates rise as depression rates rise. Weighting by access to care, however, suggests that the relationship levels off: suicide rates increase with depression only up to a certain level of depression. Moreover, states with poorer access to care tend to have higher suicide rates regardless of overall depression rates.
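
The weighted quadratic smoother in the second plot corresponds roughly to a weighted least-squares fit with a squared term. The sketch below shows that fit directly with lm() on invented toy values, so the numbers are not the report's data. It assumes the same column names and the same formula shape as the plot.

```r
# Toy values, invented for illustration only.
toy <- data.frame(
  Suicide.Rate     = c(8, 10, 12, 14, 16, 18, 20, 22),
  Depression.Rates = c(6.5, 7.5, 8.6, 9.4, 9.9, 10.1, 9.8, 9.2),
  Access.to.care   = c(3, 10, 15, 22, 30, 38, 44, 50)
)

# Weighted quadratic fit: same formula shape as y ~ x + I(x^2) in the plot
fit <- lm(Depression.Rates ~ Suicide.Rate + I(Suicide.Rate^2),
          data = toy, weights = Access.to.care)
b <- coef(fit)
b

# A negative squared-term coefficient means the curve rises, then bends down
quad <- b[["I(Suicide.Rate^2)"]]
quad < 0
```

One design point worth noting: because Access.to.care is a rank where larger numbers mean less access, using it as a weight gives states with poorer access more influence on the fitted curve.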

Further Exploring the Relationship between Access to Care and Suicide Rate. Ranked by State.

#g <- ggplot(data, aes(Access.to.care, weight =Suicide.Rate, fill = state)) + geom_bar() + ggtitle("Access to Mental Health Resources\nby Suicide Rate in US States")+ xlab("Mental Health Resource Access (from easy to hard)") + ylab(" Suicide Rates") + guides(fill=FALSE) + coord_equal()
#ggplotly(g)

as<-ggplot(data, aes(x = reorder(state, -Suicide.Rate), y = Suicide.Rate, fill = state)) + 
  geom_bar(stat = "identity", position = position_dodge(width=2))  + 
  geom_text(aes(label=Access.to.care), vjust = .5, size = 2.7) + guides(fill="none") + 
  ggtitle("Access to Mental Health Resources\nby Suicide Rate in US States", subtitle = "Ranked by suicide rate then ease of access to mental care") + 
  xlab("Mental Health Resource Access (from easy to hard)") + 
  ylab(" Suicide Rates")  + coord_flip() + theme_hc() +
  scale_colour_hc()
as

#ggplotly(as)

#ggplot(data, aes(x = reorder(state, -Suicide.Rate), y = Suicide.Rate, fill = state )) + 
  #geom_bar(stat = "identity")+ coord_flip() 

There is a relationship between ease of access to care and suicide rate, and easier access to mental health care appears to go with a lower suicide rate. However, this relationship is not very strong. For example, West Virginia is ranked 4th in care access, yet it has the 10th highest suicide rate.

#Drug Addiction and Suicide
das<-ggplot(data, aes(x= Suicide.Rate, y=Drug.Use)) + 
  geom_point()+geom_smooth(method = "lm")+
  scale_x_continuous(limits = c(5, 25))+ 
  ggtitle("Drug Usage by Suicide Rates")+
  xlab("Suicide Rates")+ ylab("Drug Usage Rates") + theme_hc() +
  scale_colour_hc()
ggplotly(das)
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
das1<-ggplot(data, aes(x= Suicide.Rate, y=Drug.Use, size = Access.to.care)) + 
  geom_point()+geom_smooth(method = "lm")+
  scale_x_continuous(limits = c(5, 25))+ 
  ggtitle("Drug Usage by Suicide Rates\nWeighted by Access to Care")+
  xlab("Suicide Rates")+ ylab("Drug Usage Rates") + theme_hc() +
  scale_colour_hc()
ggplotly(das1)
## Warning: Removed 3 rows containing non-finite values (stat_smooth).

The graph illustrates the relationship between suicide rates and the drug-use score built from the measures listed earlier (illicit drug use and opioid prescriptions, among others). It reveals a negative relationship between suicide rates and drug use. At lower suicide rates, such as 7.5 per 100,000, the drug-use score is around 45, and as suicide rates increase, drug usage decreases. This could potentially be explained by over-medication in the United States, since the states using fewer drugs seem to have more difficulty controlling suicide.

Regression Model

The dummy variable, labeled “dum”, is 1 if the suicide rate is greater than 15 per 100,000.
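
The dataset already contains this column, so the following is only a sketch of how such a dummy could be derived. The toy values are invented, and the strict "greater than 15" cutoff follows the description above.

```r
# Toy suicide rates per 100,000, invented for illustration.
toy <- data.frame(Suicide.Rate = c(9.8, 15.0, 15.1, 22.3))

# 1 when the rate is strictly greater than 15, otherwise 0
toy$dum <- as.integer(toy$Suicide.Rate > 15)
toy
```

A state at exactly 15.0 falls in the 0 group under this rule.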
reg<-data[which(data$Depression.Rates!="0"), ] #dropping missing observations. 
#regGreaterThan15<-reg[which(reg$dum!="0"), ] #testing for variables above .15 suicide rates
# lm.cluster() fits a linear model with cluster-robust standard errors; it takes no family argument
model <- miceadds::lm.cluster(data = reg, formula = Suicide.Rate ~ Depression.Rates + Drug.Use + Access.to.care, cluster = "dum")
summary(model)
## R^2 = 0.18693 
## 
##                     Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)      19.97466570 0.99722503 20.030249 3.001535e-89
## Depression.Rates -0.65294524 0.08151683 -8.009944 1.147611e-15
## Drug.Use         -0.04526338 0.01046906 -4.323540 1.535452e-05
## Access.to.care    0.13930198 0.01153599 12.075428 1.424217e-33

    The model above shows that my independent variables are strongly related to changes in the dependent variable, Suicide Rate. All of the p-values were well below .05, with the smallest predictor p-value at about 1.4e-33 (Access.to.care), many orders of magnitude below the .05 threshold. I therefore reject the null hypothesis that access to mental health resources, depression rates, and drug usage have no effect on the suicide rate. Because the standard errors are clustered on a two-level dummy derived from the suicide rate itself, these p-values should be read with caution. The R-squared value of 0.18693 indicates that the model explains about 19% of the variability in the DV, so it holds some predictive power.

Each of the 3 coefficients shows a small but statistically significant change in suicide rate per unit of the predictor.
    For each one-unit increase in Depression Rates, the suicide rate changed by -.65, indicating an inverse relationship once the other variables are held constant. This is interesting, as I would expect suicide rates to rise with depression. The test statistic (t = -8.01) shows that this coefficient is significantly different from zero.

    For drug usage, the suicide rate changed by -.045 for each one-unit increase in the drug-use score, showing that states with more reported drug use had lower suicide rates. This could be explained by the theory that anti-depression medication works, and that states with less use of it have higher rates.

    For access to mental health care, there was a positive linear relationship: each one-place increase in the access ranking was associated with a .139 increase in the suicide rate. Because a larger ranking number means less access, this indicates that states with poorer access to care tend to have higher suicide rates.
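
In a model of the form Suicide.Rate ~ predictors, each coefficient is the change in the predicted suicide rate for a one-unit change in that predictor, holding the others fixed. The sketch below illustrates this with the rounded estimates from the output above. The helper predict_rate() is a hypothetical name introduced here, and the input values are the variable means reported earlier, used only as a convenient example point.

```r
# Coefficients copied (rounded) from the regression output above
b <- c(intercept = 19.9747, depression = -0.6529,
       drug_use = -0.0453, access_rank = 0.1393)

# Hypothetical helper: predicted suicide rate per 100,000 from the linear model
predict_rate <- function(depression, drug_use, access_rank) {
  b[["intercept"]] + b[["depression"]] * depression +
    b[["drug_use"]] * drug_use + b[["access_rank"]] * access_rank
}

# Example point: the variable means reported earlier in this report
base <- predict_rate(7.752, 41.8912, 25.5)

# Moving one place down the access ranking (less access), all else equal
worse_access <- predict_rate(7.752, 41.8912, 26.5)
access_effect <- worse_access - base

# Raising the depression rate by one percentage point, all else equal
more_depression <- predict_rate(8.752, 41.8912, 25.5)
depression_effect <- more_depression - base

c(access_effect, depression_effect)
```

Each difference equals the corresponding coefficient, which is the sense in which the estimates are read as per-unit changes in the suicide rate.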

Final Points:

    Overall, there is a sizeable relationship between Depression, Drug Usage, and Access to mental health resources and the suicide rate per state. In the scatter plots, depression rates seem to hold a positive linear relationship with suicide rates, and despite a few outlier states, the relationship is weak to medium in strength. The data show that suicide rates range between about 11 and 20 per 100,000, while overall depression ranges from ~7% to 12%, and states within this depression range have varying rates of suicide. Weighting by access to mental health resources shows depression reaching a peak of about 10% and then steadily declining at the upper tier of suicide rates. The upper tier of suicide rates has much worse access to care, with the highest states ranked from 20 to 50. Despite these figures, a few states have excellent access to care yet are still in the average suicide rate category.

    The access to care ranking also has a weak positive linear relationship with suicide rates. In the bar chart above, states are ordered by suicide rate and labeled with their access to care ranking. The results show lower suicide rates where access is better and worse access rankings as the suicide rate increases. However, some outlier states have decent or good access to care yet still have higher rates. After a Google search, I found that Alaska, a high-ranking state, has 3 community health centers in just one small town, yet it is still among the 10 highest on the suicide rate scale. The drug usage variable had a negative linear relationship with the DV and seemed to be another factor that helped lower the suicide rate. Perhaps this has to do with medication that helps prevent depression, which in turn helps prevent suicide, so that less drug use for mental illnesses such as depression leads to higher suicide rates. Although the slope of the decline is marginal, it is still significant, as the trend is observed overall.
    The regression output shows that my predictions were largely correct: according to the data I used, there is a significant effect from all of these variables. There was a decently sizable effect from access to mental health resources, with poorer access rankings associated with higher suicide rates.

Policy Suggestion:

    To conclude, depression, drug usage, and mental care are all associated with suicide rates, and the US should seek to provide more resources to help people cope with these issues. This seems to be the most plausible way to prevent suicide and the other problems mental illness can cause, and to help save lives.